Name | Version | Summary | Date |
--- | --- | --- | --- |
superoptix | 0.1.0b8 | Full Stack Agentic AI Framework | 2025-08-02 15:48:45 |
zeroeval | 0.6.8 | ZeroEval SDK | 2025-08-02 06:18:53 |
openjury | 0.1.0 | Python SDK for evaluating multiple model outputs using configurable LLM-based jurors | 2025-08-01 19:36:43 |
python-flexeval | 0.1.5 | FlexEval is a tool for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems. | 2025-08-01 01:20:35 |
llama-index-packs-rag-evaluator | 0.4.0 | llama-index packs rag_evaluator integration | 2025-07-30 20:54:25 |
dyff-audit | 0.11.1 | Audit tools for the Dyff AI auditing platform. | 2025-07-30 17:35:43 |
agenta | 0.50.3 | SDK for Agenta, an open-source LLMOps platform. | 2025-07-29 17:42:14 |
quotientai | 0.4.6 | Python library for tracing, logging, and detecting problems with AI Agents | 2025-07-29 14:28:52 |
trajectopy | 3.1.2 | Trajectory Evaluation in Python | 2025-07-29 12:42:26 |
dyff-client | 0.18.0 | Python client for the Dyff AI auditing platform. | 2025-07-28 18:51:39 |
pymcpevals | 0.1.1 | Python package for evaluating MCP (Model Context Protocol) server implementations using LLM-based scoring | 2025-07-27 07:17:20 |
mandoline | 0.4.0 | Official Python client for the Mandoline API | 2025-07-26 20:32:40 |
SurvivalEVAL | 0.4.5 | The most comprehensive Python package for evaluating survival analysis models. | 2025-07-26 06:19:12 |
dyff-schema | 0.30.1 | Data models for the Dyff AI auditing platform. | 2025-07-25 17:35:17 |
evalassist | 0.1.20 | EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of other models' outputs by helping users iteratively refine evaluation criteria in a web-based interface. | 2025-07-25 16:44:14 |
monitoring-rag | 0.0.2 | A comprehensive, framework-agnostic library for evaluating Retrieval-Augmented Generation (RAG) pipelines. | 2025-07-24 11:25:53 |
novaeval | 0.4.0 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models | 2025-07-22 19:20:41 |
evalscope | 0.17.1 | EvalScope: Lightweight LLM Evaluation Framework | 2025-07-21 02:12:56 |
grandjury | 1.0.1 | Python client for the GrandJury server API: collective intelligence for model evaluation | 2025-07-18 05:08:40 |
rag-evaluation | 0.2.2 | A robust Python package for evaluating Retrieval-Augmented Generation (RAG) systems. | 2025-07-17 08:30:01 |
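
As a minimal sketch of how this list might be used locally, the snippet below checks which of the listed distributions are installed in the current environment and compares them against the versions shown in the table. It relies only on the standard-library `importlib.metadata` module; the handful of package names and version strings are taken directly from the table, and nothing is assumed about any of these libraries' own APIs.

```python
# Check which of the listed evaluation packages are installed locally and
# compare installed versions against the versions shown in the table above.
from importlib.metadata import PackageNotFoundError, version

# Example entries copied from the table (name -> listed version).
LISTED = {
    "superoptix": "0.1.0b8",
    "zeroeval": "0.6.8",
    "evalscope": "0.17.1",
    "rag-evaluation": "0.2.2",
}

for name, listed_version in LISTED.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (listed version: {listed_version})")
    else:
        note = "matches table" if installed == listed_version else f"table lists {listed_version}"
        print(f"{name}: {installed} installed ({note})")
```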